Bayesian Model Averaging
Input Adaptive Bayesian Model Averaging
Slavutsky, Yuli, Salazar, Sebastian, Blei, David M.
This paper studies prediction with multiple candidate models, where the goal is to combine their outputs. This task is especially challenging in heterogeneous settings, where different models may be better suited to different inputs. We propose Input Adaptive Bayesian Model Averaging (IA-BMA), a Bayesian method that assigns model weights conditional on the input. IA-BMA employs an input-adaptive prior and yields a posterior distribution that adapts to each prediction, which we estimate with amortized variational inference. We derive formal guarantees for its performance relative to any single predictor selected per input. We evaluate IA-BMA across regression and classification tasks, studying data from personalized cancer treatment, credit-card fraud detection, and UCI datasets. IA-BMA consistently delivers more accurate and better-calibrated predictions than both non-adaptive baselines and existing adaptive methods. Many applications require adaptive predictions. In personalized medicine, different patients respond differently to the same treatment (Mahajan et al., 2023); in fairness-sensitive domains, predictions need to adapt to subpopulations (Wang et al., 2019; Grother et al., 2019); and in fraud detection, behavioral data is often heteroskedastic and varies substantially across inputs (Varmedja et al., 2019).
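The core idea of input-dependent weighting can be sketched in a few lines. This is a minimal illustration, not the paper's method: the `gate` function below is a hypothetical stand-in for IA-BMA's amortized posterior over model weights, and the two toy predictors are invented for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                           # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def input_adaptive_average(x, predictors, gate):
    """Combine candidate predictors with weights that depend on the input.

    predictors: list of functions x -> prediction
    gate: function x -> unnormalized score per predictor (a stand-in
          for an amortized posterior over per-input model weights)
    """
    w = softmax(gate(x))                      # weights conditional on x
    preds = np.array([f(x) for f in predictors])
    return float(w @ preds)

# Toy example: one model is trusted near the origin, the other far from it.
f1 = lambda x: 2.0 * x
f2 = lambda x: x ** 2
gate = lambda x: np.array([-abs(x), abs(x) - 3.0])

print(input_adaptive_average(1.0, [f1, f2], gate))
```

At `x = 1.0` the gate favors `f1`, so the combined prediction sits closer to `f1`'s output than `f2`'s; a non-adaptive average would use the same weights everywhere.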
Beyond Bayesian Model Averaging over Paths in Probabilistic Programs with Stochastic Support
Reichelt, Tim, Ong, Luke, Rainforth, Tom
The posterior in probabilistic programs with stochastic support decomposes as a weighted sum of the local posterior distributions associated with each possible program path. We show that making predictions with this full posterior implicitly performs a Bayesian model averaging (BMA) over paths. This is potentially problematic, as model misspecification can cause the BMA weights to prematurely collapse onto a single path, in turn leading to sub-optimal predictions. To remedy this issue, we propose alternative mechanisms for path weighting: one based on stacking and one based on ideas from PAC-Bayes. We show how both can be implemented as a cheap post-processing step on top of existing inference engines. In our experiments, we find that they are more robust and lead to better predictions than the default BMA weights.
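A generic stacking step of the kind the abstract describes can be sketched as follows. This is an assumed, simplified formulation (not the paper's implementation): given each path posterior's predictive density at held-out points, find simplex weights that maximize the held-out log score of the mixture.

```python
import numpy as np

def stacking_weights(dens, steps=2000, lr=0.1):
    """Find simplex weights over program paths maximizing the held-out
    log score of the mixture, as an alternative to default BMA weights.

    dens: array (n_points, n_paths); dens[i, k] is the predictive density
          of path k's local posterior at held-out point i.
    """
    n, k = dens.shape
    theta = np.zeros(k)                         # softmax parameterization
    for _ in range(steps):
        w = np.exp(theta) / np.exp(theta).sum()
        mix = dens @ w                          # mixture density per point
        grad_w = (dens / mix[:, None]).sum(0) / n
        theta += lr * w * (grad_w - w @ grad_w)  # chain rule through softmax
    return np.exp(theta) / np.exp(theta).sum()

# Path 1 explains the held-out data far better than path 2, so stacking
# should put almost all weight on it.
rng = np.random.default_rng(0)
dens = np.column_stack([rng.uniform(0.8, 1.0, 50), rng.uniform(0.0, 0.1, 50)])
print(stacking_weights(dens))
```

Because the objective is evaluated on held-out data, a misspecified path that merely fits the training data cannot keep the weights collapsed onto itself, which is the robustness property the abstract claims for stacking.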
Automating Model Comparison in Factor Graphs
van Erp, Bart, Nuijten, Wouter W. L., van de Laar, Thijs, de Vries, Bert
The famous aphorism of George Box states: "all models are wrong, but some are useful" [1]. It is the task of statisticians and data analysts to find a model which is most useful for a given problem. The build, compute, critique and repeat cycle [2], also known as Box's loop [3], is an iterative approach for finding the most useful model. Any effort to shorten this design cycle increases the chances of developing more useful models, which in turn might yield more reliable predictions, more profitable returns or more efficient operations for the problem at hand. In this paper we choose to adopt the Bayesian formalism and therefore we will specify all tasks in Box's loop as principled probabilistic inference tasks. In addition to the well-known parameter and state inference tasks, the critique step in the design cycle is also phrased as an inference task, known as Bayesian model comparison, which automatically embodies Occam's razor [4, Ch. 28.1]. Rather than selecting a single model in the critique step, we quantify our confidence about which of the different models is best, especially when data is limited [5, Ch. 18.5.1]. The uncertainty arising from prior beliefs p(m) over a set of models m and limited observations can be naturally included through the use of Bayes' theorem.
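The closing sentence invokes Bayes' theorem over models; in standard notation (a reconstruction of the familiar identity, not quoted from the paper), the posterior over models m given data D is

```latex
p(m \mid \mathcal{D}) \;=\; \frac{p(\mathcal{D} \mid m)\, p(m)}{\sum_{m'} p(\mathcal{D} \mid m')\, p(m')},
```

where the model evidence p(D | m) automatically penalizes needless complexity, which is the sense in which model comparison embodies Occam's razor.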
Target Identification and Bayesian Model Averaging with Probabilistic Hierarchical Factor Probabilities
Target detection in hyperspectral imagery is the process of locating pixels in an image that are likely to contain a target, typically done by comparing one or more spectra for the desired target material to each pixel in the image. Target identification extends detection with an additional process that identifies more specifically the material present in each pixel that scored high in detection. Detection is generally a two-class problem of target vs. background, while identification is a many-class problem including target, background, and additional known materials. The identification process we present is probabilistic and hierarchical, which provides transparency to the process and produces trustworthy output. In this paper we show that target identification has a much lower false alarm rate than detection alone, and we provide a detailed explanation of a robust identification method using probabilistic hierarchical classification that handles the vague, user-dependent categories of materials, which differ from the specific physical categories of chemical constituents. Identification is often done by comparing mixtures of materials that include the target spectra to mixtures of materials that do not, possibly with other steps (band combinations, feature checking, background removal, etc.). Standard linear regression does not handle these problems well because the number of regressors (identification spectra) is greater than the number of feature variables (bands), and there are multiple correlated spectra. Our proposed method handles these challenges efficiently and provides additional important practical information in the form of hierarchical probabilities computed from Bayesian model averaging.
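The roll-up of model probabilities into hierarchical category probabilities can be sketched generically. This is a minimal illustration under assumed inputs (the material names, log evidences, and two-level hierarchy below are invented, not from the paper): each candidate gets a BMA posterior probability from its marginal likelihood, and a category's probability is the sum over its members.

```python
import numpy as np

def hierarchical_probabilities(log_evidence, hierarchy):
    """Roll BMA posterior model probabilities up a category hierarchy.

    log_evidence: dict name -> log marginal likelihood of that candidate
    hierarchy: dict category -> list of member candidate names
    """
    names = list(log_evidence)
    logp = np.array([log_evidence[n] for n in names])
    logp -= logp.max()                        # stabilize the normalization
    p = np.exp(logp) / np.exp(logp).sum()     # posterior model probabilities
    post = dict(zip(names, p))
    return {cat: sum(post[n] for n in members)
            for cat, members in hierarchy.items()}

# Hypothetical candidates: two paint variants and a background material.
log_ev = {"paint_a": -1.0, "paint_b": -1.5, "vegetation": -4.0}
cats = {"paint": ["paint_a", "paint_b"], "background": ["vegetation"]}
print(hierarchical_probabilities(log_ev, cats))
```

The appeal of reporting at the category level is visible in the toy numbers: neither paint variant dominates on its own, but the "paint" category carries nearly all of the posterior mass.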
On the Effectiveness of Mode Exploration in Bayesian Model Averaging for Neural Networks
Holodnak, John T., Wollaber, Allan B.
Multiple techniques for producing calibrated predictive probabilities with deep neural networks in supervised learning settings have emerged that ensemble diverse solutions discovered during cyclic training or training from multiple random starting points (deep ensembles). However, only a limited amount of work has investigated the utility of exploring the local region around each diverse solution (posterior mode). Using three well-known deep architectures on the CIFAR-10 dataset, we evaluate several simple methods for exploring local regions of the weight space with respect to Brier score, accuracy, and expected calibration error. We consider both Bayesian inference techniques (variational inference and Hamiltonian Monte Carlo applied to the softmax output layer) and the use of the stochastic gradient descent trajectory near optima. While adding separate modes to the ensemble uniformly improves performance, we show that the simple mode-exploration methods considered here produce little to no improvement over ensembles without mode exploration.
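The baseline that the paper finds hard to beat, averaging predictions from separately trained modes and scoring the result, can be sketched as follows. The two "modes" below are invented two-class probability tables, not results from the paper; they only illustrate why mode averaging helps when members err on different inputs.

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average predictive class probabilities from independently trained
    modes (a deep ensemble); each entry is (n_points, n_classes)."""
    return np.mean(prob_list, axis=0)

def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and one-hot labels
    (lower is better)."""
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

# Two hypothetical modes that are confidently right on different inputs.
labels = np.array([0, 1])
mode1 = np.array([[0.9, 0.1], [0.4, 0.6]])
mode2 = np.array([[0.6, 0.4], [0.1, 0.9]])
ens = ensemble_predict([mode1, mode2])
print(brier_score(ens, labels))   # better than either mode alone
```

Mode exploration would add further samples drawn near each of `mode1` and `mode2` to the averaged list; the paper's finding is that, for the simple exploration schemes tested, those extra samples barely move the score.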
Dangers of Bayesian Model Averaging under Covariate Shift
Izmailov, Pavel, Nicholson, Patrick, Lotfi, Sanae, Wilson, Andrew Gordon
Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data. However, Bayesian neural networks (BNNs) with high-fidelity approximate inference via full-batch Hamiltonian Monte Carlo achieve poor generalization under covariate shift, even underperforming classical estimation. We explain this surprising result, showing how a Bayesian model average can in fact be problematic under covariate shift, particularly in cases where linear dependencies in the input features cause a lack of posterior contraction. We additionally show why the same issue does not affect many approximate inference procedures, or classical maximum a-posteriori (MAP) training. Finally, we propose novel priors that improve the robustness of BNNs to many sources of covariate shift.
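The contraction failure the abstract describes can be reproduced in a linear-Gaussian model, where everything is exact. This is a minimal sketch under assumed settings (unit Gaussian prior, noise variance 0.01), not the paper's experiment: with two perfectly collinear training features, the posterior contracts along the direction the data constrains but stays at the prior along the null direction, so a shifted input that breaks the collinearity sees prior-scale predictive variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
X = np.column_stack([x, x])          # two perfectly collinear features
y = x + 0.1 * rng.normal(size=n)

# Conjugate Bayesian linear regression: prior w ~ N(0, I), noise var 0.01.
noise, prior = 0.01, 1.0
S_inv = X.T @ X / noise + np.eye(2) / prior
S = np.linalg.inv(S_inv)             # posterior covariance over weights
m = S @ X.T @ y / noise              # posterior mean

# An in-distribution point keeps the collinearity; the shifted one breaks it.
x_in = np.array([1.0, 1.0])
x_shift = np.array([1.0, -1.0])      # off the training feature subspace
print(x_in @ S @ x_in)               # tiny: the posterior has contracted here
print(x_shift @ S @ x_shift)         # ~2.0: still at the prior scale
```

Along (1, -1) the likelihood is flat, so the Bayesian model average integrates over weight settings the data never constrained; a MAP solution instead commits to the single prior-preferred point in that direction, which is one way to see why MAP can be less fragile under this kind of shift.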
Bayesian Model Averaging for Data Driven Decision Making when Causality is Partially Known
Papamichalis, Marios, Ray, Abhishek, Bilionis, Ilias, Kannan, Karthik, Krishnamurthy, Rajiv
Probabilistic machine learning models are often insufficient for decisions about interventions because those models find correlations, not causal relationships. If only observational data is available and experimentation is infeasible, the correct approach to studying the impact of an intervention is to invoke Pearl's causality framework. Even that framework assumes that the underlying causal graph is known, which is seldom the case in practice. When the causal structure is not known, one may use off-the-shelf algorithms to find causal dependencies from observational data. However, no existing method also accounts for the decision-maker's prior knowledge when developing the causal structure. The objective of this paper is to develop rational approaches for making decisions from observational data in the presence of causal-graph uncertainty and prior knowledge from the decision-maker. We use ensemble methods like Bayesian Model Averaging (BMA) to infer a set of causal graphs that can represent the data-generation process. We provide decisions by explicitly computing the expected value and risk of potential interventions. We demonstrate our approach by applying it in different example contexts.
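The decision step can be sketched once per-graph effect estimates and a graph posterior are in hand. This is an assumed formulation, not the paper's implementation: the graph names, posterior probabilities, and effect values below are invented, and risk is taken to be the variance of the effect across graphs.

```python
import numpy as np

def bma_intervention(graph_posterior, effects):
    """Average an intervention's effect over candidate causal graphs,
    weighted by their posterior probability, and report the spread
    across graphs as a risk measure.

    graph_posterior: dict graph_name -> posterior probability
    effects: dict graph_name -> estimated effect of the intervention
             under that graph's adjustment
    """
    w = np.array([graph_posterior[g] for g in graph_posterior])
    e = np.array([effects[g] for g in graph_posterior])
    mean = float(w @ e)
    risk = float(w @ (e - mean) ** 2)   # variance of the effect across graphs
    return mean, risk

# Two hypothetical candidate graphs that disagree about confounding by z.
posterior = {"x->y": 0.7, "x<-z->y": 0.3}
effects = {"x->y": 2.0, "x<-z->y": 0.5}
print(bma_intervention(posterior, effects))
```

A decision-maker's prior knowledge enters through `graph_posterior`: graphs the expert considers implausible receive low prior mass and therefore contribute little to both the expected value and the risk.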